Skip to content

labs: entire investigate — multi-agent investigation loop#1231

Merged
alishakawaguchi merged 76 commits into
mainfrom
entire-labs-investigate
May 27, 2026
Merged

labs: entire investigate — multi-agent investigation loop#1231
alishakawaguchi merged 76 commits into
mainfrom
entire-labs-investigate

Conversation

@alishakawaguchi
Copy link
Copy Markdown
Contributor

@alishakawaguchi alishakawaguchi commented May 19, 2026

https://entire.io/gh/entireio/cli/trails/397

0520.mp4

Summary

  • Adds entire investigate (labs / hidden) — a round-robin multi-agent investigation loop that drives claude-code, codex, and gemini-cli through turns appending findings/evidence/stances to a shared findings doc until quorum / stalled / paused / cancelled.
  • Subcommands: fix [run-id], show [run-id], clean [run-id|--all].
  • Inputs: [seed-doc] positional, --issue-link <url> (gh-resolved with userinfo redaction + untrusted-content envelope), or — when the spawn-time picker fires — an in-picker "Investigation prompt".
  • Lifecycle: per-turn pending_turn state.json contract, env-var provenance handshake (ENTIRE_INVESTIGATE_*) adopted by UserPromptSubmit hook, condense into checkpoint metadata on commit, HasInvestigation umbrella flag on the checkpoint summary surfaced via entire status and the soft-warn guard.
  • Resume: --continue <run-id> reloads RunState, rewrites the manifest with the new terminal outcome (no longer leaves stale "paused" records).

Supporting refactors (review/investigate boundary)

  • New leaf packages, each consumed by both commands so the duplication is gone:
    • `cmd/entire/cli/tuiutil/` — width-aware text helpers + `FormatDuration`
    • `cmd/entire/cli/gitexec/` — `git` CLI runner (separates stdout from stderr-as-error-context)
    • `cmd/entire/cli/uiform/` — accessible huh form constructor + `PromptYN`
    • `cmd/entire/cli/provenance/` — single source of truth for ENTIRE_REVIEW_* / ENTIRE_INVESTIGATE_* env contract
  • `lifecycle.go`: `adoptReviewEnv` + `adoptInvestigateEnv` share `tryAdoptEnv(spec)`.
  • `investigate/show.go` and `clean.go` share `ResolveByRunID` for exact-then-prefix resolution.
  • `investigate` uses `jsonutil.WriteFileAtomic` / `checkpoint/id.Generate` / agent name constants instead of re-implementing.

Security hardening (from adversarial review)

  • `runGhExec` redacts URL userinfo (`https://user:TOKEN@github.com/...\`) before any arg reaches an error string — earlier credential-redaction only covered the seed-doc / log paths.
  • `--issue-link` now requires an interactive y/N before launching agents with permission/sandbox bypass; non-interactive callers see the warning on stderr and proceed (CI / scripted use is not hard-blocked).
  • Issue body + comments are wrapped in a `<untrusted source="...">` envelope and any literal `` inside the body is defanged with a zero-width space.

Test plan

  • `mise run fmt && mise run lint` clean
  • `go test ./cmd/entire/cli/...` all green
  • Manual smoke: `entire investigate <seed.md>` (single agent, multi-agent picker)
  • Manual smoke: `entire investigate --issue-link ` (accept + decline branches)
  • Manual smoke: pause (Ctrl+C) → `entire investigate --continue ` → reaches quorum → manifest rewritten with new outcome
  • Manual smoke: `entire investigate fix `, `show `, `clean ` / `clean --all`

🤖 Generated with Claude Code


Note

High Risk
High risk due to introducing a new command that spawns external agent processes with sandbox/permission-bypass flags, adds deletion functionality (investigate clean), and extends checkpoint metadata/wire formats (HasInvestigation + investigate fields) that affect on-disk persisted data.

Overview
Adds a hidden labs entire investigate command that runs a round-robin multi-agent loop (with resume via --continue), bootstraps a per-run findings doc (from seed doc, issue link, or picker prompt), persists run state/manifests, and provides fix, show, and clean subcommands (including confirmed deletion of saved investigation artifacts).

Introduces a shared spawn.Spawner interface plus concrete spawners for claude-code, codex, and gemini-cli (using non-interactive invocation and permission/sandbox-bypass flags), and adds agentlaunch.LaunchFixAgent to start follow-up “fix” sessions while stripping review/investigate provenance env.

Extends checkpoint/session metadata to record investigation provenance (investigate_* fields) and a HasInvestigation umbrella flag, propagates it through v2 checkpoint summary merge logic and entire explain --json export, and adds tests (unit + integration) pinning the new on-disk/JSON wire formats and adoption via ENTIRE_INVESTIGATE_* env vars.

Reviewed by Cursor Bugbot for commit f8edb81. Configure here.

alishakawaguchi and others added 30 commits May 8, 2026 23:00
Ships `entire investigate` as a hidden top-level command surfaced through
`entire labs`. Runs a non-TUI marvin-style round-robin loop across
launchable, hook-enabled agents (claude-code, codex, gemini-cli),
bootstraps a findings doc + timeline, and tags each spawned session with
a new agent_investigate Kind so the next commit condenses the
investigation onto entire/checkpoints/v1 alongside review runs.

GitHub issue/PR URLs seed the loop through gh; a local manifest plus an
`entire investigate fix` helper feeds accepted findings into a follow-up
coding session.

Investigate gets its own HasInvestigation umbrella flag and
Kind.IsInvestigate() predicate rather than reusing review's umbrella, so
the feature stays semantically distinct.

Includes a small spawner refactor: extract a shared Spawner interface
under cmd/entire/cli/agent/spawn/ so review and investigate share the
per-agent argv builders. Review argv is byte-identical post-refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5d9432613a5d
- runContinue now reloads settings.Investigate.AlwaysPrompt so
  --continue preserves the user's configured preamble. Previously a
  Ctrl+C plus resume silently dropped it.
- ParseStanceFromTimeline's "headingFound" return is now consumed:
  an agent that exits 0 but writes no turn heading counts as a soft
  failure, so two consecutive missing-heading turns trip pause-on-
  failure rather than burning the budget silently.
- adoptInvestigateEnv validates EnvRunID via investigate.IsValidRunID
  before tagging state. An empty or non-12-hex run ID is rejected
  with a logged warning so junk run IDs cannot leak into checkpoint
  metadata.
- ResolveIssueLink uses url.Redacted() for log/seed-doc URLs so a
  basic-auth credential embedded in --issue-link never reaches stderr,
  logs, or the findings doc.
- Banner prints topic via %q to neutralize ANSI escapes.
- openTurnLog documents that concurrent --continue on the same run id
  is not supported (single-shell continue only).

Tests added for each fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 6fce1174a808
1. Prompt injection via --issue-link: wrap untrusted issue/PR body and
   each comment in a labeled <untrusted source="..."> envelope so a
   well-aligned agent treats the content as data. Defang any literal
   close-tag inside the body so the envelope is not breakable. Add a
   regression test that pins both the wrapper and the defang.
2. Resume crash on shrunken --agents: refuse rather than panic when
   persisted NextAgentIdx exceeds the (possibly overridden) agent list.
   Surface an actionable error pointing at the state file.
3. Per-turn log unbounded growth: wrap the log file in a 16 MiB-capped
   writer that drops the tail and emits a single truncation marker. The
   capped writer reports len(p) so exec.Cmd never sees a short-write
   teardown signal. Verbose tee output remains uncapped (terminal flow
   control bounds it).
4. Settings load failure on --continue silently dropping AlwaysPrompt:
   surface a visible warning when the settings file is broken on resume,
   so the user notices their preamble has disappeared instead of seeing
   unexplained agent behaviour change.
5. RunState.Round vs TurnStance.Round semantic conflict: rename
   RunState.Round to CompletedRounds (0-indexed completed-pass count) so
   it is clearly distinct from TurnStance.Round (1-indexed
   round-this-turn-belongs-to). Document both fields.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 3d52e957d3cc
The investigate loop now drives a Bubble Tea dashboard modelled on
`entire review`, with one row per agent showing AGENT / STATUS /
DURATION / TURN / APPROVED. Progress is delivered through a small
`ProgressSink` interface so the loop itself stays free of any TUI
dependency: TTY runs get the dashboard, non-TTY runs (CI, redirected
stdout, agent-host invocations) get the same two-line shape today's
log output produces.

The `--verbose` flag is removed — per-turn agent stdout still lands on
disk at `<git-common-dir>/entire-investigations/transcripts/<run-id>/
turn-N-<agent>.log`, which is the canonical place to inspect raw
output post-hoc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: df0fb75a3046
Mirrors the review-side change to keep agent picker output out of the
committed project settings. Reads/writes only the local file so other
settings fields are preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: cd436b111b04
Reflects the storage change in fix(investigate): store config in
.entire/settings.local.json.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4e825ed88af2
Prompts the user on the next `entire investigate` invocation to move
the investigate config out of committed project settings and into
.entire/settings.local.json. Non-interactive runs print guidance and
continue. Existing local investigate config is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8349c994a7eb
The migration runs before any --edit / --findings / --continue dispatch
so users see the move-to-local prompt on the next invocation regardless
of which subcommand they reach for.

Also harden loadProjectInvestigateSettings to fail open on malformed
project settings JSON: the user already sees a parse error from
settings.Load downstream, and blocking the migration prompt on bad JSON
would make `entire investigate` unusable in the exact situation it
exists to help recover from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: ceee2ec8b984
Symmetric to TestHeadHasReviewCheckpoint_WrapperPreservesContract: when
the checkpoint at HEAD has HasReview=true but HasInvestigation=false,
headHasInvestigateCheckpoint must return false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 6cf6b7a5a660
Reads CheckpointSummary.HasInvestigation at HEAD via the existing
headHasInvestigateCheckpoint wrapper and prompts the user (default Yes)
before launching another run. Skipped for --edit and --findings modes
and for non-interactive callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: bda6b287cd53
Returns the user's per-run agent selection plus an optional per-run
prompt textarea. Mirrors review/multipicker.go. Wiring into cmd.go
follows in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 6e239f58bc02
When 2+ agents are configured and --agents is not set, prompt the user
for a per-run agent subset and an optional preamble. The preamble is
joined onto AlwaysPrompt for this run only; settings are not modified.
Mirrors review's multipicker UX.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5d5bb9154962
Adds a per-agent buffer of timelineEntry rows fed by turnStartedMsg and
turnFinishedMsg. Lays the groundwork for the Ctrl+O drill-in view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 69124ca4e131
Extends ProgressSink.TurnFinished with a preview string parsed as the
first non-empty non-stance line of the turn block. Consumed by the
TUI sink and surfaced in the upcoming drill-in detail view.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 29a8946fad70
Renders an agent's timeline buffer for the drill-in view. Output is
padded to exactly termHeight lines to avoid Bubble Tea alt-screen
ghost rows. Wiring into Update/View follows in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4bbb6700b1ba
Adds detail-mode state to the TUI model and routes keyboard + mouse
wheel events to scroll the per-agent timeline buffer. Esc returns to
the dashboard; ←/→ cycle agents; ↑/↓ scroll. Detail mode toggles
AltScreen and MouseMode on the rendered tea.View — mirrors the review
package's pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a4d580798d15
- picker.go:183 — corrected post-save line to point at .entire/settings.local.json
- cmd.go runContinue — wire spawn-time multipicker on resume when persisted state has 2+ agents; thread perRun into AlwaysPrompt via composeAlwaysPrompt
- cmd.go soft-warn — emit logging.Info "running anyway (non-interactive)" when the user can't be prompted
- multipicker.go — extract sortAgentChoices helper for testability
- tests — add 3 missing spec-mandated tests: ResultSortedAlphabetically, PerRunPromptOptional, SoftWarnSilentInNonInteractive

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: d9fc3447ba6e
runInvestigate never called logging.Init, so every `logging.Info(...)`
inside the loop hit slog.Default() — which writes a plain-text line to
stderr. During a TUI run those stderr lines interleave with the
dashboard redraw and produce visible garbage like

    2026/05/12 15:52:00 INFO investigate: turn end ... turn=⡿

(the ⡿ is a spinner frame from the alt-screen leaking into the log
line). Call logging.Init at the top of runInvestigate and defer Close,
mirroring every other long-running command (attach, migrate, resume,
hooks_git_cmd). Logs now land in .entire/logs/entire.log only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 2d9dc05d866b
Claude Code's `-p` (non-interactive) mode silently denies Write/Edit
tool calls when no permission flag is set — there's no UI to answer
the permission prompt. This made `entire investigate` unusable with
claude-code: every turn the agent did its analysis but couldn't write
the `## Turn N — claude-code` heading to the timeline doc, so the loop
counted the turn as a soft failure and marked the agent ✗ failed.

Add --permission-mode acceptEdits to the shared spawner argv. The flag
auto-accepts edits to files the agent decides to write; it does not
itself trigger writes. Review doesn't write files in practice, so the
flag is a no-op for the review path — only investigate's behaviour
changes.

Tests update the pinned argv contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: c694e8346827
The previous stub captured \$2, which used to be the prompt under
\`claude -p <prompt>\`. After 98df345 added --permission-mode
acceptEdits, the argv shape became \`claude -p --permission-mode
acceptEdits <prompt>\`, so \$2 captured the flag name. Iterate to grab
the last positional so the stub survives future argv shifts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 089d400d1133
The directive was stripped by gofmt during the Tier 2 work — it sat on
its own comment line above the function and got separated from the
signature by a blank line, making golangci-lint disregard it. Restore
the directive directly above 'func' so lint:go is clean again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5ca4375d0ebf
The three lines

  Investigating: "<topic>" (run <id>)
    Findings: <path>
    Timeline: <path>

were left visible above the live dashboard in TTY mode and duplicated
the TUI's own title row — the dashboard then redrew below them,
producing the noisy "stale rows above the live screen" effect the
user reported. Skip the banner when the TUI will render. Non-TTY mode
keeps the banner since the text sink doesn't surface those paths.
Same change applies to runContinue's "Resuming investigation:" line.

Also add the investigate/cmd.go path to the .golangci.yaml ireturn
exclusion list: golangci-lint --fix kept stripping the nolint
directive on buildProgressSink during fmt, and a per-path exclusion
mirrors the existing review/tui_model.go entry which has the same
abstract-sink-plus-concrete-handle return contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: f1a08334c505
single-select picker on --findings

Two UX regressions from the Tier 2 work:

1. --continue resumes a paused run. The persisted state already captures
   the user's agent selection from the original run, so reopening the
   multipicker on every resume is friction. Worse, if the user
   accidentally deselects the agent whose turn is next, the resume
   refuses to proceed (NextAgentIdx exceeds list length). Trust the
   persisted state; --agents <csv> still narrows on resume for the
   rare case it's needed.

2. --findings reaches for the picker in TTY mode and shows just the
   chosen manifest's detail. Users who type --findings want to see all
   runs, not pick one — the `fix:` hint on each row gives them the
   next step. Always print the full list. Remove the now-dead
   promptForInvestigateManifest and printInvestigateManifestDetail
   helpers and their no-longer-needed imports.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a0e16fb297cf
Each investigation turn already runs as a tagged agent session
(Kind=agent_investigate) whose full transcript is condensed onto
entire/checkpoints/v1 on commit — same machinery as review. The
.log files written to <git-common-dir>/entire-investigations/transcripts/
duplicate that. Drop the capture entirely; route agent stdout/stderr
to io.Discard. Removes ~150 lines of bounded-writer/log-rotation code
and the LoopDeps.TranscriptDir field.

Existing on-disk .log dirs aren't cleaned up by this change — they're
harmless and the user can rm them when convenient (or via a future
`entire investigate rm` subcommand).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8f3b007365b7
Two investigations on the same topic used to write to the same
.entire/investigations/<slug>.md + <slug>-timeline.md, stomping each
other's findings. New layout puts each run under its own subdir:

  .entire/investigations/<run-id>-<slug>/findings.md
  .entire/investigations/<run-id>-<slug>/timeline.md

--output still overrides verbatim. Legacy on-disk investigations keep
working because state + manifest persist the file paths directly; only
new runs use the new layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 7776d7ccdfc7
Per-run artefacts now live at:

    <git-common-dir>/entire-investigations/<run-id>/
        findings.md   <- the collaborative findings doc
        state.json    <- cursors + stance history + pending_turn

timeline.md is gone. The agent reports its stance by setting a
`pending_turn` field in state.json after editing findings.md; the loop
reads it after the agent exits, appends to stances[], and clears the
field. ParseStanceFromTimeline, findTurnBlock, normaliseStance, and the
turn-block regex globals are removed.

Bootstrap no longer creates a timeline file. The --output flag is
removed (escape hatch can be reintroduced if needed). The
ENTIRE_INVESTIGATE_TIMELINE_DOC env var is removed;
ENTIRE_INVESTIGATE_STATE_DOC is added so the agent can locate state.json.

The TUI "findings" preview that D2 introduced now feeds the agent's
pending_turn note straight through (parseFindingsPreview /
readTimelineFile / findTurnBlock are all gone).

State files move from <git-common-dir>/entire-investigations/state/
<run-id>.json to <git-common-dir>/entire-investigations/<run-id>/
state.json, alongside findings.md. The List path walks subdirs instead
of *.json. Old state files (and the per-run dir under
.entire/investigations/) become orphaned; this is accepted in the
redesign.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 9c6e606a9cf6
The findings doc is now a single converged answer the agents edit in
place each turn, not a chronological log of attempts. New structure:

    ## Current understanding   <- the team's best answer right now
    ## Supporting evidence     <- claims tied to concrete refs
    ## Disputed / unverified   <- what isn't yet confirmed

No more numbered findings, no per-turn attribution in the doc, no
Approach / Conclusion / Recommendations sections. Provenance for "who
changed what" lives in the agent session transcripts on
entire/checkpoints/v1 (recoverable via `entire checkpoint explain`),
not in the doc itself.

The agent prompt is rewritten to direct each agent to read, verify,
and edit -- not append. Stance still reports via state.json's
pending_turn field (unchanged from R1). Testdata prompt snapshots
regenerated; bootstrap scaffold updated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8465aba45120
On terminal outcomes (Quorum/Stalled) the loop now reads the final
findings.md content into the manifest's new findings_content field,
then RemoveAlls the per-run directory <git-common-dir>/
entire-investigations/<run-id>/. Findings survive in the manifest
(parallel to review's AggregateOutput); the file is gone but the
content is recoverable via `entire investigate --findings` or a
follow-up `entire investigate show <run-id>` command.

Paused/Cancelled runs keep the per-run dir untouched so --continue
works and the user can read findings while a run is suspended.

The --findings list prints "<captured in manifest>" instead of a
file path when the run is cleaned up, so users know the content is
still accessible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: fb498711831a
Print a saved investigation's summary + findings without needing the
per-run directory (which is auto-cleaned on Quorum/Stalled by R3).
Findings come from the manifest's embedded findings_content for
terminal outcomes, or from the on-disk findings.md for paused/
cancelled runs.

Resolution accepts a full run id, a unique prefix, or no argument when
there is exactly one manifest. Multiple-candidate cases print a
candidate list rather than guessing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 343a9cb50805
alishakawaguchi and others added 5 commits May 19, 2026 12:18
The migration was scaffolded to move an "investigate" key from
project settings (.entire/settings.json) to local settings
(.entire/settings.local.json), prompting the user on every
`entire investigate` invocation while the legacy key existed. Since
the investigate feature has not shipped, no project anywhere has the
legacy key — the migration is purely vestigial overhead on the cold
path.

Removed:
- cmd/entire/cli/investigate/migration.go and migration_test.go
- The maybePromptInvestigateSettingsMigration call in cmd.go's RunE
  (and the per-command PromptYN/canPrompt setup it required).
- TestNewCommand_RunsMigrationBeforeDispatch.
- settings.SaveLocalRaw — added in the prior commit only to give the
  investigate migration a typed write path; no other callers.

Deps.PromptYN and the realPromptYN wrapper stay — they are still used
by the HEAD-soft-warn ("a checkpoint at HEAD already has
HasInvestigation set; run again?").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 98dac7e8cb76
runGhExec formatted strings.Join(args, " ") directly into its wrapped
error. When --issue-link carries a credential
(https://user:TOKEN@github.com/...), gh failure surfaces TOKEN through
stderr and .entire/logs/.

ResolveIssueLink already redacted the URL for its own log paths via
url.URL.Redacted(), but the runGhExec error path was missed. The earlier
TestResolveIssueLink_RedactsCredentialsInErrors stubbed runGhFn entirely,
bypassing runGhExec's formatting code.

Add a redactArgsForError helper that maps each arg through
url.URL.Redacted() when it parses as a URL with userinfo; non-URL args
pass through. Cover both the helper and the leaf redactURLUserinfo with
unit tests that exercise the production format path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 4d2544eb5a3c
…ed issue seeds

Two fixes from the Codex adversarial review of this branch.

1. runContinue never updated the saved manifest after a resumed run
   reached quorum/stalled. The fresh path routes through
   executeLoopAndCapture + writeRunManifest, but --continue used a
   thin executeLoop wrapper that discarded the LoopResult. After a
   paused -> quorum continuation, `entire investigate show / --findings
   / fix` saw the stale "paused" outcome with empty FindingsContent.

   Replace the wrapper call with executeLoopAndCapture + writeRunManifest.
   Reusing state.StartedAt keeps the manifest filename stable
   (<stamp>-<runID>.json) so the new write overwrites the paused record
   in place rather than creating a duplicate. WorktreePath isn't on
   RunState, so re-resolve via paths.WorktreeRoot — failure leaves the
   manifest written with an empty path rather than blocking the rewrite.

   Drop the now-unused executeLoop wrapper.

   New regression test: TestNewCommand_ContinueWritesTerminalManifest.

2. --issue-link feeds external GitHub content (issue body + comments)
   into agents that spawn with permission/sandbox bypass
   (claude-code --permission-mode bypassPermissions,
   codex --dangerously-bypass-approvals-and-sandbox). A malicious issue
   or comment can influence agent behaviour; the <untrusted> XML
   envelope is a prompt convention, not a real isolation boundary.

   Add a confirmation gate that fires right after resolveTopicAndSeed
   when issueSeed is non-empty. Interactive: prints the warning to
   stderr (including the source URL) and prompts y/N with default N;
   decline returns cleanly. Non-interactive: prints the warning and
   proceeds — CI / scripted callers passed --issue-link deliberately,
   and hard-blocking would break automation; the risk surfaces in
   operator-facing telemetry.

   New regression tests:
   - TestConfirmUntrustedIssueSeed_DeclinedExitsCleanly
   - TestConfirmUntrustedIssueSeed_AcceptedReturnsOK
   - TestConfirmUntrustedIssueSeed_PromptError

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 404f400adb7b
Conflict resolution: cmd/entire/cli/review_helpers.go::headCheckpointFlags.
Keep this branch's triple-return shape (hasReview, hasInvestigation,
info) — both review and investigate consume it via thin wrappers — but
adopt main's new reader path: newCommittedCheckpointReader +
checkpoint.ReadCommittedCheckpoint, which routes through the
configured v1/v2 store mix. Main had collapsed the function to
review-only and used the new reader; this branch had the triple-return
but the older ResolveCommittedReaderForCheckpoint call. The merged
shape preserves both improvements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: e7b3e33f1058
…references

Mechanical comment sweep across the 30 new source files on this branch.
Removed:
- "Mirrors review/X" / "near-copy of" cross-package analogies
- "Migrated from review.go" / "Kept here because..." breadcrumbs
- File-header boilerplate that just named the file's role
- Narrative "we deliberately ..." / past-state references
- WHAT-restatements of well-named identifiers

Kept the load-bearing WHY: security constraints, sentinel-value docs,
go-git workaround notes, idempotence + threading invariants, and the
godoc-required package comment on env.go.

No code changes; build, vet, lint, and tests all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 463230cd5723
Copilot AI review requested due to automatic review settings May 19, 2026 23:16
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit f8edb81. Configure here.

Comment thread cmd/entire/cli/investigate/loop.go
Comment thread cmd/entire/cli/investigate/cmd.go
Comment thread cmd/entire/cli/investigate/loop.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds hidden labs support for entire investigate, a resumable multi-agent investigation loop, and refactors shared review/investigation utilities for prompts, provenance, git execution, TUI formatting, and fix-agent launching.

Changes:

  • Introduces investigation run state, manifests, loop execution, agent spawning, GitHub issue seeding, TUI/text progress, and fix/show/clean subcommands.
  • Adds investigation metadata propagation into session state, checkpoint summaries, status output, and explain JSON export.
  • Extracts shared utilities for accessible forms, display formatting, provenance env vars, git command execution, and fix-agent launching.

Reviewed changes

Copilot reviewed 88 out of 88 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
.golangci.yaml Updates lint allowances for new interfaces and investigate command code.
cmd/entire/cli/agent/architecture_test.go Excludes shared spawn package from agent package discovery.
cmd/entire/cli/agent/claudecode/spawner.go Adds Claude Code non-interactive spawner.
cmd/entire/cli/agent/claudecode/spawner_test.go Tests Claude Code spawner command shape.
cmd/entire/cli/agent/codex/spawner.go Adds Codex non-interactive spawner.
cmd/entire/cli/agent/codex/spawner_test.go Tests Codex spawner command shape.
cmd/entire/cli/agent/geminicli/spawner.go Adds Gemini CLI non-interactive spawner.
cmd/entire/cli/agent/geminicli/spawner_test.go Tests Gemini CLI spawner command shape.
cmd/entire/cli/agent/spawn/spawn.go Defines shared spawner interface.
cmd/entire/cli/agentlaunch/launch.go Adds shared fix-agent launcher.
cmd/entire/cli/agentlaunch/launch_test.go Tests provenance env stripping for fix-agent launches.
cmd/entire/cli/attach.go Adds reusable review session tagging helper.
cmd/entire/cli/checkpoint/checkpoint.go Adds investigation fields to checkpoint metadata types.
cmd/entire/cli/checkpoint/committed.go Writes and preserves investigation metadata in committed checkpoints.
cmd/entire/cli/explain_export.go Exposes investigation flags/fields in explain JSON export.
cmd/entire/cli/explain_export_test.go Tests investigation JSON export behavior.
cmd/entire/cli/gitexec/gitexec.go Adds shared git CLI execution helper.
cmd/entire/cli/head_checkpoint_flags_test.go Tests HEAD review/investigation checkpoint flag helpers.
cmd/entire/cli/investigate/bootstrap.go Builds initial investigation findings documents.
cmd/entire/cli/investigate/clean.go Implements investigation cleanup.
cmd/entire/cli/investigate/cmd.go Wires main investigate command flow and subcommands.
cmd/entire/cli/investigate/cmd_internal_test.go Tests internal command helpers.
cmd/entire/cli/investigate/env.go Defines investigate provenance env contract.
cmd/entire/cli/investigate/env_test.go Tests investigate env handling.
cmd/entire/cli/investigate/findings.go Implements local findings listing.
cmd/entire/cli/investigate/findings_test.go Tests findings list output.
cmd/entire/cli/investigate/fix.go Implements investigate fix prompt/launch flow.
cmd/entire/cli/investigate/issuelink.go Resolves GitHub issue/PR URLs into investigation seed docs.
cmd/entire/cli/investigate/loop.go Implements round-robin investigation loop.
cmd/entire/cli/investigate/manifest.go Persists and resolves local investigation manifests.
cmd/entire/cli/investigate/manifest_test.go Tests manifest persistence/resolution.
cmd/entire/cli/investigate/multipicker.go Adds spawn-time multi-agent picker.
cmd/entire/cli/investigate/multipicker_test.go Tests picker helper behavior.
cmd/entire/cli/investigate/picker.go Adds first-run investigate config picker.
cmd/entire/cli/investigate/picker_test.go Tests investigate config picker behavior.
cmd/entire/cli/investigate/progress.go Adds progress sink interfaces and text sink.
cmd/entire/cli/investigate/progress_test.go Tests text/null progress sinks.
cmd/entire/cli/investigate/prompt.go Composes per-turn investigation prompts.
cmd/entire/cli/investigate/prompt_yn.go Shares accessible y/N prompt.
cmd/entire/cli/investigate/prompt_test.go Golden-tests investigation prompts.
cmd/entire/cli/investigate/show.go Implements saved investigation display.
cmd/entire/cli/investigate/show_test.go Tests show command logic.
cmd/entire/cli/investigate/state.go Adds persisted run state store.
cmd/entire/cli/investigate/testdata/prompt-first-round.txt Adds prompt golden file.
cmd/entire/cli/investigate/testdata/prompt-mid-loop.txt Adds prompt golden file.
cmd/entire/cli/investigate/testdata/prompt-with-always.txt Adds prompt golden file.
cmd/entire/cli/investigate/tui_detail.go Adds TUI detail view rendering.
cmd/entire/cli/investigate/tui_detail_test.go Tests TUI detail rendering.
cmd/entire/cli/investigate/tui_sink.go Adds Bubble Tea progress sink.
cmd/entire/cli/investigate/tui_text.go Adapts shared TUI text utilities.
cmd/entire/cli/investigate_bridge.go Wires investigate deps from the CLI package.
cmd/entire/cli/investigate_bridge_test.go Tests investigate root/bridge wiring.
cmd/entire/cli/labs.go Lists investigate in labs overview.
cmd/entire/cli/lifecycle.go Adopts investigate provenance env into session state.
cmd/entire/cli/provenance/env.go Centralizes review/investigate env var names.
cmd/entire/cli/review/cmd.go Uses shared gitexec HEAD helper.
cmd/entire/cli/review/env.go Aliases review env names through provenance package.
cmd/entire/cli/review/fix.go Uses shared fix-agent launcher.
cmd/entire/cli/review/picker.go Uses shared accessible form helper.
cmd/entire/cli/review/scope.go Uses shared gitexec runner.
cmd/entire/cli/review/synthesis_sink.go Uses shared y/N prompt helper.
cmd/entire/cli/review/tui_model.go Uses shared duration formatting.
cmd/entire/cli/review/tui_text.go Uses shared TUI text helpers.
cmd/entire/cli/review_helpers.go Adds shared HEAD checkpoint flag resolution.
cmd/entire/cli/root.go Registers hidden investigate command.
cmd/entire/cli/session/state.go Adds investigate session kind and fields.
cmd/entire/cli/session/state_test.go Tests investigate session state serialization.
cmd/entire/cli/settings/settings.go Adds investigate settings schema/merge support.
cmd/entire/cli/settings/settings_test.go Tests investigate settings behavior.
cmd/entire/cli/status.go Displays investigation status for HEAD checkpoints.
cmd/entire/cli/status_test.go Tests investigation status output.
cmd/entire/cli/strategy/manual_commit_condensation.go Propagates investigate metadata during condensation.
cmd/entire/cli/strategy/manual_commit_condensation_test.go Tests condensation of investigation metadata.
cmd/entire/cli/tuiutil/display.go Adds shared display-width and duration helpers.
cmd/entire/cli/uiform/uiform.go Adds shared accessible huh form/prompt helpers.
cmd/entire/cli/utils.go Delegates form/accessibility helpers to uiform.
Comments suppressed due to low confidence (1)

cmd/entire/cli/investigate/cmd.go:402

  • This interactive warning also echoes the raw issue URL, which can contain userinfo credentials. Use the redacted URL form for operator-facing output so tokens embedded in --issue-link are not leaked to the terminal or captured logs.
	fmt.Fprintln(cmd.ErrOrStderr(), warning)
	fmt.Fprintf(cmd.ErrOrStderr(), "Source: %s\n", issueLink)

Comment thread cmd/entire/cli/investigate/cmd.go Outdated
Comment thread cmd/entire/cli/investigate/cmd.go Outdated
Comment thread cmd/entire/cli/investigate/cmd.go Outdated
Comment thread cmd/entire/cli/investigate/cmd.go
Comment thread cmd/entire/cli/settings/settings.go Outdated
Comment thread cmd/entire/cli/investigate/issuelink.go
Comment thread cmd/entire/cli/investigate/loop.go
Comment thread cmd/entire/cli/investigate/picker.go Outdated
Comment thread cmd/entire/cli/investigate/issuelink.go
Comment thread cmd/entire/cli/investigate/picker.go Outdated
alishakawaguchi and others added 6 commits May 19, 2026 17:31
The round, turn, and prompt coordinates on investigation sessions were
audit-only with no consumers outside their own round-trip tests; the loop
still tracks round/turn internally to render "Round X of Y" prompts but
nothing downstream needs the persisted copies. Drop them from explain
export, CommittedMetadata, session.State, the env-var contract, and the
adoption + condensation paths. Only investigate_run_id and
investigate_topic survive — enough to attribute a checkpoint to a run and
display the topic.

Also drop AttachSession + AttachOptions from attach.go: dead code left
behind when `entire investigate attach` was removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: aedacb2a168e
TestNullProgressSink_ImplementsInterface ended with a
strings.HasPrefix(string(OutcomeQuorum), "qu") sanity check that could
never fail. Replace the runtime interface assertion with a package-level
var _ ProgressSink = nullProgressSink{} declaration (compile-time guard)
and rename the function to TestNullProgressSink_NoPanic so it honestly
describes what it verifies.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 6c5e94c73c6b
Correctness:
- clean.go: delete run dir before manifest so a failed cleanup leaves a
  recoverable manifest breadcrumb
- show.go / fix.go: reject relative FindingsDoc at read time; commit
  manifest docstring to absolute paths only
- loop.go: bump state-save failure to Warn; fileFingerprint now uses
  size+sha256 to catch sub-second same-length edits; classifyRunErr
  distinguishes spawn vs non-zero exit; remove redundant slogString
  wrappers
- state.go: warn instead of swallowing unreadable state.json in List
- tui_sink.go: Start(ctx) ctx-watcher pushes tea.Quit on cancel so Wait
  unblocks on early loop return; flip defer order in cmd.go so cancelTUI
  fires before Wait
- tui_model.go: preserve rowStatusQueued for agents that never ran; Ctrl+C
  cancels from detail mode too (matches footer)
- manifest.go: tighten file mode 0o644 → 0o600 (matches state.go)

Security:
- fix.go: wrap investigation prompt and findings body in <untrusted>
  envelopes with defanged close-tags so prior-agent-ingested untrusted
  seed content cannot inject instructions into the fix prompt
- review/env.go: AppendReviewEnv now strips both ENTIRE_REVIEW_* AND
  ENTIRE_INVESTIGATE_* (symmetric to AppendInvestigateEnv)
- lifecycle.go: doc note on the env+agent+SHA adoption trust model

Layering:
- provenance/env.go: own IsValidRunID (uses checkpoint/id.Pattern)
- lifecycle.go: drop investigate import; use provenance.* directly

Tests:
- new TestAdoptInvestigateEnv_TagsSessionViaHandleLifecycleTurnStart
  mirrors the review side
- new TestLaunchFixAgent_EmptyEnvFallback_StripsHostProvenance covers the
  cmd.Env==nil → os.Environ() branch
- new tui_sink_test.go covers the ctx-cancel-unblocks-Wait contract
- updated fix/loop/tui_model tests to match new behavior

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b4aa8d29fb64
CI lint:
- escape U+200B literals (staticcheck ST1018)

Real bugs (cursor[bot] + Copilot review):
- loop: ctx-cancel mid-cmd.Run() now classifies as OutcomeCancelled
  instead of being counted as a turn failure and tripping OutcomePaused
  after two consecutive cancels (#1)
- loop: save state before returning OutcomePaused so --continue resumes
  from a snapshot that includes the failing turn (#11)
- investigate fix: wrap context.Canceled as SilentError so Ctrl+C during
  the fix session doesn't print a cobra usage banner (#2)
- cmd: redact URL userinfo on the issue-link Source: line in both
  interactive and non-interactive paths (#5)
- issuelink: redact URL userinfo across gh stderr (not just argv) so a
  token in --issue-link cannot leak via the error path (#10)
- cmd: outcome-aware footer — "Investigation ended" + resume hint for
  paused/cancelled, "Investigation complete" + fix hint only for
  Quorum/Stalled (#6)
- cmd: validate maxTurns/quorum bounds after settings/flag merge so a
  hand-edited negative max_turns or oversized quorum errors cleanly
  instead of silently stalling (#7)
- issuelink: tolerate GitHub URL trailing segments (/pull/123/files,
  trailing slash) — the regex now anchors prefix and ignores tail (#13)
- picker: don't print "Saved investigate config" before persistence;
  moved to the caller after SaveLocal succeeds (#14)
- picker: guard pickerFormOverride with atomic.Pointer so parallel
  tests that swap the override don't race (#12)

Docs:
- settings/loop: fix stale "0 → 3" max_turns doc; default is 2 (#8/#9)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0135abeee0d
…sume hints

`entire investigate --findings` previously printed the literal string
`<captured in manifest>` for terminal-outcome runs whose findings live
inside the manifest JSON. That's a placeholder, not a path — viewers had
no obvious way to actually read the findings without knowing about
`entire investigate show`.

New listing format:

  <run-id> · <topic> · <agents> · <when>
    view:    entire investigate show <run-id>     (every row, always)
    fix:     entire investigate fix <run-id>      (terminal outcomes only)
    resume:  entire investigate --continue <id>   (paused/cancelled)
    path:    <findings.md path>                   (only when on-disk file still exists)

The `view:` line points at the show subcommand, which works regardless
of where findings live; `fix` is only suggested for terminal outcomes
since paused/cancelled runs have incomplete findings; the on-disk path
is only printed when it points at an extant file (terminal outcomes
auto-clean the per-run dir, so the prior code's stale path was already
suppressed there).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 23da2a4fc345
Resolves conflict in cmd/entire/cli/review_helpers.go: keeps the branch's
`headCheckpointFlags` 3-tuple return (HasReview, HasInvestigation, info)
needed by the investigate side, while switching to main's new
checkpoint.NewCommittedReader(ctx, repo, CommittedReaderOptions{})
signature so the v1/v2 read selection is handled inside the checkpoint
package.

Also fixes two test sites that drifted from main's NewV2GitStore
signature (now takes just *git.Repository, no remote-name argument):
head_checkpoint_flags_test.go and status_test.go.

go.mod: promotes github.com/atotto/clipboard to a direct dependency per
`go mod tidy` (it's used directly in this branch's code).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 064894d05fa7
@alishakawaguchi alishakawaguchi self-assigned this May 20, 2026
@alishakawaguchi alishakawaguchi marked this pull request as ready for review May 20, 2026 21:07
@alishakawaguchi alishakawaguchi requested a review from a team as a code owner May 20, 2026 21:07
alishakawaguchi and others added 6 commits May 26, 2026 09:19
Move headCheckpointFlags, headHasReviewCheckpoint, and
headHasInvestigateCheckpoint out of review_helpers.go (an import-cycle
grab-bag) into head_checkpoint_flags.go so the existing
head_checkpoint_flags_test.go pairs with a matching source file per Go
convention. Pure move, no logic change: the functions stay in package
cli (checkpoint access can't live in the review/ subpackage without
cycling) and remain cross-feature — used by status plus the review and
investigate re-run guards.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Entire-Checkpoint: f31d2ea541b6
The file-level //nolint:ireturn,wrapcheck directive listed ireturn, but
ireturn never fires in this file, so nolintlint flagged the directive as
unused and CI failed. Keep wrapcheck (still needed for the osfs
passthrough methods) and drop ireturn.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Entire-Checkpoint: bdb7feab3ec4
Addresses four security findings on the path that feeds attacker-controlled
GitHub content into agents launched with permission/sandbox bypass.

1. Sanitize untrusted single-line metadata. Issue title, author login, and
   label names render outside the <untrusted> envelope (the title as a
   top-level heading), so an injected newline + "# SYSTEM:" could land as
   document structure. New sanitizeInline collapses control chars/newlines
   and neutralizes a leading markdown control char. Applied to title, author,
   labels, and comment authors.

2. Defang </untrusted> case-insensitively with whitespace tolerance. The
   exact-string match missed "</untrusted >", "</UNTRUSTED>", "</untrusted\t>",
   which an LLM may still read as a real closing tag. Replaced with a regex;
   the shared writeUntrustedBlock chokepoint also covers fix.go's re-wrap.

3. Refuse non-interactive --issue-link by default. The CI path (remote content
   + auto-approving agent + no human gate) was the most dangerous. Added
   --allow-untrusted-seed; without it a non-interactive run now refuses
   instead of silently proceeding.

4. Validate run IDs at the source. A planted manifest with run_id "../../.."
   flowed through clean → RunDir → os.RemoveAll. List() now skips manifests
   whose run_id fails validateRunID and ResolveByRunID ignores invalid
   entries, so no unvalidated id can reach RunDir. RunDir's precondition is
   documented.

Tests cover sanitizeInline, adversarial title/author/label rendering, close-tag
variant defang, the non-interactive refuse/opt-in branches, and the manifest
run-ID filtering.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Entire-Checkpoint: a2ace8be65d1
…ue-link test

TestInvestigate_IssueLink_ResolvesViaFakeGh runs `entire investigate
--issue-link` via execx.NonInteractive (no TTY). The strict default added in
10e58ee now refuses such runs without --allow-untrusted-seed, so the test
exited 1. The test consciously opts in — it is exercising the issue-link
resolution + bootstrap path, not the refusal gate (covered by unit tests).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Entire-Checkpoint: dd4053c251ee
Soph
Soph previously approved these changes May 27, 2026
Resolved conflict in cmd/entire/cli/gitrepo/alternates_fs.go: both sides
dropped the unused ireturn nolint directive; kept main's more detailed
wrapcheck rationale (functionally identical directive).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: a64caa818a4f
@alishakawaguchi alishakawaguchi merged commit 42f10f1 into main May 27, 2026
9 checks passed
@alishakawaguchi alishakawaguchi deleted the entire-labs-investigate branch May 27, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants